Mount your Google Drive and download the dataset

This assignment is based on this 2D object detection tutorial, which uses PyTorch to implement the SSD network to detect objects in images from the VOC dataset: https://github.com/sgrvinod/a-PyTorch-Tutorial-to-Object-Detection

First we mount our Google Drive.

This code only has to be run once. It creates the JSON files TRAIN_images.json, TRAIN_objects.json, and label_map.json, which hold the image paths, the ground-truth object information, and the label-to-number mapping, respectively. This should take about 1 hour and 12 minutes.

Create the VOC Dataset loader

Next, the Dataset loader for VOC is implemented.
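For reference, a minimal sketch of what such a Dataset can look like, assuming the JSON files created above and eliding the image transforms. The file layout, field names, and the helper `collate_fn` are assumptions for illustration, not the exact assignment code:

```python
import json
import torch
from torch.utils.data import Dataset
from PIL import Image

class PascalVOCDataset(Dataset):
    # Minimal sketch: reads the JSON files created in the previous step.
    # File layout and field names are assumptions, not the assignment code.
    def __init__(self, data_folder, split='TRAIN'):
        with open(f'{data_folder}/{split}_images.json') as f:
            self.images = json.load(f)           # list of image paths
        with open(f'{data_folder}/{split}_objects.json') as f:
            self.objects = json.load(f)          # per-image boxes and labels

    def __len__(self):
        return len(self.images)

    def __getitem__(self, i):
        image = Image.open(self.images[i]).convert('RGB')
        objects = self.objects[i]
        boxes = torch.FloatTensor(objects['boxes'])    # (n_objects, 4)
        labels = torch.LongTensor(objects['labels'])   # (n_objects,)
        return image, boxes, labels

def collate_fn(batch):
    # Images can be stacked, but each image has a different number of
    # objects, so boxes and labels must stay as lists of tensors.
    images, boxes, labels = zip(*batch)
    return torch.stack(images, dim=0), list(boxes), list(labels)
```

A custom `collate_fn` is needed because the default collation cannot stack per-image object tensors of varying length.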

Model Implementation

Base layers

First we create the base or encoder part of the network.

You must fill in the ResNet code.

Auxiliary layers

The base layers produce the low-level feature maps with 512 and 1024 channels. Now the higher-level feature maps are created, with 512, 256, 256, and 256 channels.
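These auxiliary layers can be sketched as follows, in the tutorial's style; the kernel and stride choices here are assumptions that reproduce the listed shapes for a 19x19 input:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class AuxiliaryConvolutions(nn.Module):
    # Sketch of the SSD auxiliary layers shrinking the 19x19 map to 1x1.
    # Kernel/stride choices are assumptions that match the listed shapes.
    def __init__(self):
        super().__init__()
        self.conv8_1 = nn.Conv2d(1024, 256, kernel_size=1)
        self.conv8_2 = nn.Conv2d(256, 512, kernel_size=3, stride=2, padding=1)  # -> 10x10
        self.conv9_1 = nn.Conv2d(512, 128, kernel_size=1)
        self.conv9_2 = nn.Conv2d(128, 256, kernel_size=3, stride=2, padding=1)  # -> 5x5
        self.conv10_1 = nn.Conv2d(256, 128, kernel_size=1)
        self.conv10_2 = nn.Conv2d(128, 256, kernel_size=3)                      # -> 3x3
        self.conv11_1 = nn.Conv2d(256, 128, kernel_size=1)
        self.conv11_2 = nn.Conv2d(128, 256, kernel_size=3)                      # -> 1x1

    def forward(self, conv7_feats):
        out = F.relu(self.conv8_1(conv7_feats))
        conv8_2 = F.relu(self.conv8_2(out))
        out = F.relu(self.conv9_1(conv8_2))
        conv9_2 = F.relu(self.conv9_2(out))
        out = F.relu(self.conv10_1(conv9_2))
        conv10_2 = F.relu(self.conv10_2(out))
        out = F.relu(self.conv11_1(conv10_2))
        conv11_2 = F.relu(self.conv11_2(out))
        return conv8_2, conv9_2, conv10_2, conv11_2
```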

Prediction layers

At this point we have our 6 feature maps.

The low level feature maps: (N, 512, 38, 38), (N, 1024, 19, 19)

Also the high level feature maps: (N, 512, 10, 10), (N, 256, 5, 5), (N, 256, 3, 3), (N, 256, 1, 1)

Each prior box requires a classification output with one score per class, plus the 4 regressed box-location values. These convolutions are created in the init function.

In the forward pass, each convolution is applied to its respective input feature map. The resulting tensors are then reshaped and concatenated so that the classification output has shape (N, 8732, n_classes) and the box output has shape (N, 8732, 4). This format is easier to work with when the network output is passed to the loss function during training, or through NMS during testing.
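The reshaping step can be sketched for a single feature map like this; `n_values` is 4 for box offsets or `n_classes` for scores (the function name is ours, not the tutorial's):

```python
import torch

def flatten_predictions(conv_out, n_values):
    # Sketch: reshape one prediction-conv output (N, n_boxes*n_values, H, W)
    # into (N, H*W*n_boxes, n_values) so maps can be concatenated on dim 1.
    N = conv_out.size(0)
    out = conv_out.permute(0, 2, 3, 1).contiguous()  # (N, H, W, n_boxes*n_values)
    return out.view(N, -1, n_values)

# e.g. the 38x38 map with 4 priors per location and 21 classes:
# flatten_predictions(cls_conv_out, 21) -> (N, 38*38*4, 21)
```

After each map is flattened this way, a single `torch.cat` along dimension 1 yields the (N, 8732, ...) tensors.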

The SSD300 Model

init - Defines all network layers and creates the prior boxes

create_prior_boxes - Creates the 8732 prior boxes across the 6 feature maps
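As a sanity check, the 8732 figure follows from the feature-map sizes and the priors-per-cell counts used by SSD300 (4 priors for the 38x38, 3x3, and 1x1 maps, 6 for the rest):

```python
# Where the 8732 prior boxes come from: priors per cell, per feature map.
fmap_dims = {'conv4_3': 38, 'conv7': 19, 'conv8_2': 10,
             'conv9_2': 5, 'conv10_2': 3, 'conv11_2': 1}
priors_per_cell = {'conv4_3': 4, 'conv7': 6, 'conv8_2': 6,
                   'conv9_2': 6, 'conv10_2': 4, 'conv11_2': 4}

total = sum(fmap_dims[f] ** 2 * priors_per_cell[f] for f in fmap_dims)
print(total)  # 8732
```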

forward - Sends the input data through the three network components and then returns the predicted locations and classification scores.

detect_objects - After a forward pass, the predictions can be sent to this function during testing to perform NMS and produce the final output.

Answer the following questions after reading the NMS code and comparing it to the version in the lecture notes / tutorial.

  1. What variables within the batch_size for loop represent "D" and "$\bar{B}$"?

    • We know that in the NMS algorithm (considering only one object class in an image), D represents the bounding boxes we want to keep (the unsuppressed boxes with the highest objectness scores), while $\bar{B}$ represents all bounding boxes in the image sorted by objectness score in decreasing order.
    • Comparing with the code, we find that D is "class_decoded_locs[1 - suppress]", and $\bar{B}$ is "class_decoded_locs[sort_ind]" (or simply "class_decoded_locs" after sorting).
  2. The NMS pseudocode is written with operations such as union and set subtraction. Within the NMS Python code, how are boxes selected to be added to the "D" output?

    • Within the NMS Python code, "class_decoded_locs[1 - suppress]" selects the unsuppressed bounding boxes (suppress equal to 1 means a box is suppressed, so indexing with [1 - suppress] keeps the unsuppressed ones).

The MultiBoxLoss

During training, the output from the SSD forward pass is sent to the criterion (set to this function) to calculate the loss.
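In outline, the MultiBox loss combines a smooth-L1 localization loss over the positive (matched) priors with a cross-entropy confidence loss that uses hard negative mining at a 3:1 negative-to-positive ratio. A compact sketch, assuming already-encoded targets and ignoring the zero-positives edge case (the function name and exact reductions are ours):

```python
import torch
import torch.nn.functional as F

def multibox_loss_sketch(pred_locs, pred_scores, true_locs, true_classes,
                         neg_pos_ratio=3, alpha=1.0):
    # pred_locs (N, 8732, 4), pred_scores (N, 8732, n_classes),
    # true_locs (N, 8732, 4) encoded offsets, true_classes (N, 8732), 0 = background
    positive = true_classes > 0                       # priors matched to objects
    n_positives = positive.sum()

    # Localization loss only over positive priors
    loc_loss = F.smooth_l1_loss(pred_locs[positive], true_locs[positive])

    # Per-prior confidence loss, then hard negative mining
    n_classes = pred_scores.size(2)
    conf_all = F.cross_entropy(pred_scores.view(-1, n_classes),
                               true_classes.view(-1), reduction='none')
    conf_all = conf_all.view(positive.shape)
    conf_neg = conf_all.clone()
    conf_neg[positive] = 0.                           # rank only the negatives
    conf_neg, _ = conf_neg.sort(dim=1, descending=True)
    n_hard = neg_pos_ratio * positive.sum(dim=1)      # per-image negative budget
    rank = torch.arange(conf_neg.size(1)).expand_as(conf_neg)
    hard_mask = rank < n_hard.unsqueeze(1)            # keep hardest negatives only
    conf_loss = (conf_all[positive].sum() + conf_neg[hard_mask].sum()) / n_positives

    return conf_loss + alpha * loc_loss
```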

Training

With the model implemented, it is time to train. Training should take about 2 hours and 9 minutes for 10 epochs.

Training SSD300 with VGG and the original learning rate adjuster

This can be run without making any changes to the code.

Training SSD300 with ResNet and the original learning rate adjuster

This should be run after implementing the ResNet Base.

Training SSD300 with VGG and using a PyTorch learning rate scheduler

This should be run after modifying the training loop to use a learning rate scheduler.
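One way to do this is sketched below with `MultiStepLR`, which mirrors the original "decay at fixed points" behavior. The milestones and the stand-in model here are placeholders, not the assignment's settings:

```python
import torch

# Stand-in model and optimizer; in the assignment these are the SSD300
# and its SGD optimizer from the training setup.
model = torch.nn.Linear(10, 2)
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3, momentum=0.9)
# Decay the LR by 10x after epochs 6 and 8 (placeholder milestones)
scheduler = torch.optim.lr_scheduler.MultiStepLR(optimizer,
                                                 milestones=[6, 8], gamma=0.1)

for epoch in range(10):
    # ... run the training loop for one epoch here ...
    scheduler.step()  # advance the schedule once per epoch
```

Calling `scheduler.step()` once per epoch replaces the manual learning-rate adjustment calls in the original loop.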

Testing

Now let's run the eval code; it should take about 30 minutes per model.

Testing SSD300 with VGG and the original learning rate adjuster

Testing SSD300 with ResNet and the original learning rate adjuster

Testing SSD300 with VGG and using a PyTorch learning rate scheduler

Viewing results

And lastly let's view some images with our detections!